AITopics | nominal data

nominal data

Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?

Dawson, Charles, Tran, Van, Li, Max Z., Fan, Chuchu

arXiv.org Machine Learning | Feb-28-2025

Increased deployment of autonomous systems in fields like transportation and robotics has been accompanied by a corresponding increase in safety-critical failures. These failures can be difficult to model and debug due to the relative lack of data: compared to tens of thousands of examples from normal operations, we may have only seconds of data leading up to the failure. This scarcity makes it challenging to train generative models of rare failure events, as existing methods risk either overfitting to noise in the limited failure dataset or underfitting due to an overly strong prior. We address this challenge with CalNF, or calibrated normalizing flows, a self-regularized framework for posterior learning from limited data. CalNF achieves state-of-the-art performance on data-limited failure modeling and inverse problems and enables a first-of-a-kind case study into the root causes of the 2022 Southwest Airlines scheduling crisis.

conference paper, dataset, posterior, (15 more...)

arXiv.org Machine Learning

2502.2111

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Michigan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services > Airport (1.00)
Transportation > Air (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

New Approach to Clustering Random Attributes

Gniazdowski, Zenon

arXiv.org Artificial Intelligence | Dec-12-2024

This paper proposes a new method for similarity analysis and, consequently, a new algorithm for clustering different types of random attributes, both numerical and nominal. For nominal attributes to be clustered, however, their values must first be properly encoded. In the encoding process, nominal attributes obtain a new representation in numerical form. Only numeric attributes can be subjected to factor analysis, which allows them to be clustered in terms of their similarity to factors. The proposed method was tested on several sample datasets and was found to be universal: it allows clustering of numerical attributes, clustering of nominal attributes, and simultaneous clustering of numerical attributes together with numerically encoded nominal attributes.
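
As a rough illustration of the encoding step this abstract describes, the sketch below turns a nominal attribute into numeric columns. The abstract does not say which encoding the paper uses, so one-hot encoding is an assumption here, and `one_hot_encode` is a hypothetical helper name.

```python
def one_hot_encode(values):
    """Map each nominal value to a 0/1 indicator row, one column per category.

    One-hot encoding is an assumed choice; the listed paper may encode
    nominal attributes differently.
    """
    categories = sorted(set(values))               # fixed, reproducible column order
    column = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[column[v]] = 1                         # mark the column of this value
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
```

After this step every column is numeric, so the attribute can in principle be handed to numeric-only procedures such as factor analysis.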

artificial intelligence, machine learning, variance, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.26348/znwwsi.31.41

2412.09748

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks > Manufacturer (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

On the Analysis of Correlation Between Nominal Data and Numerical Data

Gniazdowski, Zenon

arXiv.org Artificial Intelligence | Feb-3-2023

The article investigates the possibility of measuring the strength of a linear correlation relationship between nominal data and numerical data. Correlation coefficients for variables coded with real numbers as well as for variables coded with complex numbers were studied. For variables coded with real numbers, unambiguous measures of real linear correlation were obtained. In the case of complex coding, it has been observed that the obtained complex correlation coefficients change with the permutation of the phases in the complex numbers used to code classes of elements with equal cardinalities. It was found that a necessary condition for linear correlation is that the data set can be linearly ordered. Since the set of complex numbers admits no linear order, complex correlation coefficients cannot be used as a measure of linear correlation. For such cases, a substitute action was suggested that would prevent classes of identical elements in the nominal data from having equal cardinality. This action would consist of a correction of the data, analogous to the correction applied during preprocessing or cleaning of data containing missing or outlier values.
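
To make the idea of coding nominal classes with real numbers concrete, here is a minimal sketch. It assumes one particular real-valued coding, replacing each class label by the mean of the numerical variable within that class. The abstract does not state the paper's coding, so this choice and the helper names `pearson` and `code_by_class_mean` are illustrative only.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def code_by_class_mean(labels, y):
    """Assumed coding: replace each nominal label by the mean of y in its class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for label, value in zip(labels, y):
        sums[label] += value
        counts[label] += 1
    return [sums[label] / counts[label] for label in labels]


labels = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 2.0, 4.0, 5.0, 9.0, 10.0]

r = pearson(code_by_class_mean(labels, y), y)

# Renaming a class must not change the result, because the coding
# depends only on which observations share a label.
renamed = ["z" if label == "a" else label for label in labels]
r_renamed = pearson(code_by_class_mean(renamed, y), y)
```

Under this coding the coefficient is unaffected by how the classes are named, and it equals the square root of the between-class share of the total variance (the correlation ratio).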

artificial intelligence, correlation coefficient, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.26348/znwwsi.27.57

2302.02007

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
Europe > Poland > Lublin Province > Lublin (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Understanding The Data Types For Machine Learning And Data Science - MarkTechPost

#artificialintelligence | Nov-29-2022, 00:45:32 GMT

Machine learning (a subfield of AI) aims to program computers to learn and grow as people do. Machine learning may automate virtually any activity that can be solved using a pattern or a set of rules derived from data. It's crucial to have a firm grasp of the various data types in order to clean and preprocess the data in preparation for use with ML algorithms. For machines to recognize patterns in data, the data must first be translated into a numerical representation. This will allow us to pick the top-performing models that can quickly and accurately identify the underlying patterns.

categorical data, information, machine learning and data science, (12 more...)

#artificialintelligence

Industry:

Banking & Finance (0.49)
Media (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Monte Carlo EM for Deep Time Series Anomaly Detection

Aubet, François-Xavier, Zügner, Daniel, Gasthaus, Jan

arXiv.org Machine Learning | Dec-29-2021

Time series data are often corrupted by outliers or other kinds of anomalies. Identifying the anomalous points can be a goal on its own (anomaly detection), or a means to improving performance of other time series tasks (e.g. forecasting). Recent deep-learning-based approaches to anomaly detection and forecasting commonly assume that the proportion of anomalies in the training data is small enough to ignore, and treat the unlabeled data as coming from the nominal data distribution. We present a simple yet effective technique for augmenting existing time series models so that they explicitly account for anomalies in the training data. By augmenting the training data with a latent anomaly indicator variable whose distribution is inferred while training the underlying model using Monte Carlo EM, our method simultaneously infers anomalous points while improving model performance on nominal data. We demonstrate the effectiveness of the approach by combining it with a simple feed-forward forecasting model. We investigate how anomalies in the train set affect the training of forecasting models, which are commonly used for time series anomaly detection, and show that our method improves the training of the model.
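
The latent anomaly indicator idea can be sketched on a toy problem. The example below is not the paper's deep forecasting model. It assumes a one-dimensional Gaussian for nominal points and a flat density for anomalies, and it estimates the parameters by sampling the indicators in the E-step. The function name `monte_carlo_em` and all settings are hypothetical. In this toy model the posterior is available in closed form, so the sampling is used purely to show the mechanism.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def monte_carlo_em(xs, n_iters=30, n_samples=20, seed=0):
    """Fit a Gaussian to the nominal points while inferring anomaly indicators."""
    rng = random.Random(seed)
    anomaly_density = 1.0 / (max(xs) - min(xs))   # assumed flat anomaly model
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    pi = 0.1                                      # initial anomaly proportion
    post = []
    for _ in range(n_iters):
        # E-step: posterior probability that each point is anomalous.
        post = []
        for x in xs:
            a = pi * anomaly_density
            b = (1.0 - pi) * normal_pdf(x, mu, sigma)
            post.append(a / (a + b))
        # Monte Carlo part: sample the indicator n_samples times per point
        # and count how often the point was drawn as nominal.
        nominal = [sum(rng.random() >= p for _ in range(n_samples)) for p in post]
        total = sum(nominal)
        # M-step: refit the nominal Gaussian on the sampled nominal points.
        mu = sum(c * x for c, x in zip(nominal, xs)) / total
        var = sum(c * (x - mu) ** 2 for c, x in zip(nominal, xs)) / total
        sigma = max(math.sqrt(var), 1e-6)
        pi = min(max(1.0 - total / (n_samples * len(xs)), 1e-3), 1.0 - 1e-3)
    return mu, sigma, pi, post


data_rng = random.Random(1)
nominal_points = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
anomalies = [data_rng.uniform(6.0, 10.0) for _ in range(10)]
xs = nominal_points + anomalies

naive_mean = sum(xs) / len(xs)            # pulled toward the anomalies
mu, sigma, pi, post = monte_carlo_em(xs)  # post[-10:] belong to the anomalies
```

The naive mean is shifted by the contaminated points, while the fitted `mu` stays close to the centre of the nominal data and the injected anomalies receive high posterior anomaly probability.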

anomaly, anomaly detection, time series, (12 more...)

arXiv.org Machine Learning

2112.14436

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Flow-based anomaly detection

Maziarka, Łukasz, Śmieja, Marek, Sendera, Marcin, Struski, Łukasz, Tabor, Jacek, Spurek, Przemysław

arXiv.org Machine Learning | Oct-6-2020

We propose OneFlow, a flow-based one-class classifier for anomaly (outlier) detection that finds a minimal volume bounding region. Contrary to density-based methods, OneFlow is constructed in such a way that its result typically does not depend on the structure of outliers. This is because, during training, the gradient of the cost function is propagated only over the points located near the decision boundary (behavior similar to the support vectors in SVM). The combination of flow models and the Bernstein quantile estimator allows OneFlow to find a parametric form of the bounding region, which can be useful in various applications including describing shapes from 3D point clouds. Experiments show that the proposed model outperforms related methods on real-world anomaly detection problems.
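
For readers unfamiliar with the Bernstein quantile estimator mentioned here, the sketch below shows one common form of it: a weighted average of all order statistics with binomial weights. Whether OneFlow uses exactly this variant, and how it enters the training loss, is not stated in the abstract, so treat this as an assumption. `bernstein_quantile` is a hypothetical name.

```python
import math


def bernstein_quantile(sample, p):
    """Smooth estimate of the p-th quantile from all order statistics.

    Each sorted value x_(i) gets the Binomial(n - 1, p) probability of i
    (with i counted from 0), so the estimate varies smoothly with p.
    """
    xs = sorted(sample)
    n = len(xs)
    return sum(
        math.comb(n - 1, i) * p ** i * (1.0 - p) ** (n - 1 - i) * xs[i]
        for i in range(n)
    )


# Hypothetical anomaly scores, e.g. distances of points from a centre.
scores = [0.2, 0.5, 0.4, 0.9, 0.3, 2.5, 0.6, 0.7]

# A bounding radius intended to contain roughly 90% of the points.
radius = bernstein_quantile(scores, 0.9)
```

Because every sample point contributes to the estimate, it changes smoothly as the points move, which is the property that makes an estimator of this kind convenient inside gradient-based training.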

data mining, detection, machine learning, (18 more...)

arXiv.org Machine Learning

2010.03002

Country: South America > Paraguay > Asunción > Asunción (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Data Types in Statistics Used for Machine Learning.

#artificialintelligence | Sep-6-2020, 09:26:14 GMT

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics allows you to understand a subject much more deeply. To become a successful data scientist, you must know the basics. Math and statistics are the building blocks of machine learning algorithms.

artificial intelligence, machine learning, statistics, (13 more...)

#artificialintelligence

Country: Asia > India (0.05)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rethinking Assumptions in Deep Anomaly Detection

Ruff, Lukas, Vandermeulen, Robert A., Franks, Billy Joe, Müller, Klaus-Robert, Kloft, Marius

arXiv.org Machine Learning | May-30-2020

Though anomaly detection (AD) can be viewed as a classification problem (nominal vs. anomalous), it is usually treated in an unsupervised manner since one typically does not have access to, or it is infeasible to utilize, a dataset that sufficiently characterizes what it means to be "anomalous." In this paper we present results demonstrating that this intuition surprisingly does not extend to deep AD on images. On a recent AD benchmark on ImageNet, classifiers trained to discern between normal samples and just a few (64) random natural images are able to outperform the current state of the art in deep AD. We find that this approach is also very effective on other common image AD benchmarks. Experimentally we discover that the multiscale structure of image data makes example anomalies exceptionally informative.

data mining, detection, machine learning, (17 more...)

arXiv.org Machine Learning

2006.00339

Country:

North America > United States > California (0.14)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
Europe > Germany > Berlin (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Natural Language Processing

#artificialintelligence | Aug-7-2019, 02:58:59 GMT

Data is all around us, and it comes in two forms: tabular and text. With good statistical tools, tabular data has a lot to convey. But it is really hard to get something out of text, especially natural language as it is spoken. So what is natural language? We humans have a very complex language, and natural language is the true form of human language, spoken or written sincerely and often going beyond strict grammatical rules.

artificial intelligence, natural language processing, prediction, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

An Euclidean Distance Based on Tensor Product Graph Diffusion Related Attribute Value Embedding for Nominal Data Clustering

Gu, Lei (Nanjing University of Posts and Telecommunications) | Zhou, Ningning (Nanjing University of Posts and Telecommunications) | Zhao, Yang (Nanjing Forestry University)

AAAI Conferences | Feb-8-2018

Unlike numerical data clustering, nominal data clustering is a very difficult problem because there is no natural relative ordering between nominal attribute values. This paper aims to make the Euclidean distance measure appropriate for nominal data clustering. The core idea is attribute value embedding, namely, transforming each nominal attribute value into a numerical vector. The embedding method consists of four steps. In the first step, weights that quantify the amount of information in attribute values are calculated for each value of each nominal attribute, based on each object and its k nearest neighbors. In the second step, an intra-attribute value similarity matrix is created for each nominal attribute using the attribute values' weights. In the third step, for each nominal attribute, we find the attribute with the maximal dependence on it and build an inter-attribute value similarity matrix from the attribute value weights related to these two attributes. In the last step, a diffusion matrix is constructed for each nominal attribute by the tensor product graph diffusion process, so that the resulting value embedding captures both intra- and inter-attribute value similarity information. To evaluate the effectiveness of the proposed method, experiments are conducted on 10 data sets. The results demonstrate that the method not only enables the Euclidean distance to be used for nominal data clustering but also achieves better clustering performance than several existing state-of-the-art approaches.

matrix, nominal data, tave, (11 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
(2 more...)

Genre:

Workflow (0.68)
Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
